18:22
2026-05-16
research.nvidia.com
large-language-models
iGRPO: Self-Feedback-Driven LLM Reasoning
Researchers introduced Iterative Group Relative Policy Optimization (iGRPO), a two-stage reinforcement learning method that improves large language model reasoning by having the model generate and ref…
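iGRPO builds on Group Relative Policy Optimization (GRPO). This summary is truncated and does not describe iGRPO's two stages, so the sketch below is not iGRPO itself. It shows only the group-relative advantage used in standard GRPO: several completions are sampled for the same prompt, and each completion's reward is normalized against the mean and standard deviation of its group, which removes the need for a learned value critic. The function name, the `eps` guard, and the choice of population standard deviation are illustrative assumptions. Published implementations differ on these details.

```python
import statistics
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Sketch of standard GRPO-style advantages (hypothetical helper, not iGRPO's code).

    `rewards` holds one scalar reward per completion, and all completions
    were sampled from the same prompt, so together they form one group.
    """
    # Group baseline: the mean reward of the sibling completions.
    mean = statistics.fmean(rewards)
    # Population std over the group (an assumption; some implementations use sample std).
    std = statistics.pstdev(rewards)
    # Score each completion relative to its group. eps avoids division by zero
    # when every completion received the same reward.
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Hypothetical binary correctness rewards for 4 sampled answers to one prompt.
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

With two correct and two incorrect answers, the correct ones get advantages close to +1 and the incorrect ones close to -1. If every answer in a group is equally right or wrong, all advantages are 0, so that group contributes no learning signal.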